
    Exploring OpenMP Accelerator Model in a real-life scientific application using hybrid CPU-MIC platforms

    Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016), Sofia (Bulgaria), October 6-7, 2016.
    The main goal of this paper is to assess the suitability of the OpenMP Accelerator Model (OMPAM) for porting a real-life scientific application to heterogeneous platforms containing a single Intel Xeon Phi coprocessor. This OpenMP extension, supported since version 4.0 of the standard, offers a unified directive-based programming model dedicated to massively parallel accelerators. In our study, we focus on applying the OMPAM extension together with OpenMP tasks to a parallel application which implements a numerical model of alloy solidification. To map the application efficiently onto target hybrid platforms using constructs such as omp target, omp target data, and omp target update, we propose a decomposition of the main tasks belonging to the computational core of the studied application. As a result, the coprocessor executes the major parallel workloads, while the CPUs are responsible for the parts of the application that do not require massively parallel resources. Effective overlapping of computations with data transfers is another goal achieved in this way. The proposed approach allows us to execute the whole application 3.5 times faster than the original parallel version running on two CPUs.
    This research was conducted with the support of COST Action IC1305 (NESUS), as well as the National Science Centre (Poland) under grant no. UMO-2011/03/B/ST6/03500. The authors are grateful to the Czestochowa University of Technology for granting access to Intel Xeon Phi coprocessors provided by the MICLAB project no. POIG.02.03.00.24-093/13 (http://miclab.pl).
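
    As a rough illustration of the OMPAM constructs named above, the following C sketch keeps two fields resident on the device with omp target data, offloads the main loop with omp target, and uses omp target update to synchronize a single field back to the host each step. The field names, sizes, and the trivial "solidification" update are hypothetical placeholders, and the tasking the paper uses to overlap transfers with host work is omitted; this is not the authors' implementation.

    /* Hedged sketch: persistent device data region, offloaded kernel, and
     * selective host/device synchronization via target update. Placeholder
     * fields and kernel; requires an OpenMP 4.0+ compiler with offloading. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N (512 * 512)
    #define STEPS 10

    int main(void)
    {
        double *temp  = malloc(N * sizeof(double));   /* temperature field (placeholder)    */
        double *phase = malloc(N * sizeof(double));   /* solid-fraction field (placeholder) */
        for (int i = 0; i < N; ++i) { temp[i] = 1700.0; phase[i] = 0.0; }

        /* Keep both fields resident on the accelerator for all time steps. */
        #pragma omp target data map(tofrom: temp[0:N], phase[0:N])
        for (int step = 0; step < STEPS; ++step) {

            /* Major parallel workload runs on the accelerator. */
            #pragma omp target teams distribute parallel for
            for (int i = 0; i < N; ++i) {
                temp[i]  -= 0.01 * temp[i];                   /* cooling (placeholder)        */
                phase[i] += (temp[i] < 1650.0) ? 0.01 : 0.0;  /* solidification (placeholder) */
            }

            /* Copy only the phase field back so the host can handle the part
             * that does not need massively parallel resources (here: output). */
            #pragma omp target update from(phase[0:N])
            printf("step %d: phase[0] = %f\n", step, phase[0]);
        }

        free(temp);
        free(phase);
        return 0;
    }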

    Adaptation of MPDATA Heterogeneous Stencil Computation to Intel Xeon Phi Coprocessor

    The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adapting the 3D MPDATA algorithm to the Intel MIC architecture. In order to utilize the available computing resources, we propose a (3 + 1)D decomposition of MPDATA heterogeneous stencil computations. This approach is based on a combination of loop tiling and loop fusion techniques, which allows us to alleviate memory/communication bounds and better exploit the theoretical floating-point efficiency of target computing platforms. An important method of improving the efficiency of the (3 + 1)D decomposition is partitioning the available cores/threads into work teams, which reduces inter-cache communication overheads. This method also increases opportunities for efficient distribution of MPDATA computations onto the available resources of the Intel MIC architecture, as well as Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, each containing two CPUs and an Intel Xeon Phi coprocessor. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results, executing MPDATA almost 2 times faster than two Intel Xeon E5-2697 v2 CPUs.
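
    The sketch below illustrates, in plain C with OpenMP, the general idea of tiling a 3D stencil along one dimension and fusing two dependent sweeps inside each tile so that intermediate data stays cache-resident. The grid sizes, tile size, and averaging kernels are hypothetical, halo/boundary handling is ignored, and the thread-per-tile parallelization is only a loose analogue of the work teams described above; it is not the MPDATA code.

    /* Hedged sketch of loop tiling plus loop fusion for two dependent 3D
     * stencil sweeps. Placeholder kernels; no halo handling. */
    #include <stdlib.h>

    #define NX 64
    #define NY 64
    #define NZ 64
    #define TILE 8                                 /* tile height along k */
    #define IDX(i, j, k) ((((i) * NY) + (j)) * NZ + (k))

    static void fused_sweeps(const double *in, double *tmp, double *out)
    {
        /* Each thread (standing in for a "work team") owns a k-tile and
         * performs both sweeps on it before moving on, so tmp for that
         * tile stays in cache between the sweeps. */
        #pragma omp parallel for schedule(static)
        for (int kt = 1; kt < NZ - 1; kt += TILE) {
            int kend = (kt + TILE < NZ - 1) ? kt + TILE : NZ - 1;

            /* First sweep: 3-point average along i (placeholder kernel). */
            for (int i = 1; i < NX - 1; ++i)
                for (int j = 1; j < NY - 1; ++j)
                    for (int k = kt; k < kend; ++k)
                        tmp[IDX(i, j, k)] = (in[IDX(i - 1, j, k)] + in[IDX(i, j, k)]
                                             + in[IDX(i + 1, j, k)]) / 3.0;

            /* Second sweep, fused into the same tile: 3-point average along j. */
            for (int i = 1; i < NX - 1; ++i)
                for (int j = 1; j < NY - 1; ++j)
                    for (int k = kt; k < kend; ++k)
                        out[IDX(i, j, k)] = (tmp[IDX(i, j - 1, k)] + tmp[IDX(i, j, k)]
                                             + tmp[IDX(i, j + 1, k)]) / 3.0;
        }
    }

    int main(void)
    {
        size_t n = (size_t)NX * NY * NZ;
        double *in  = calloc(n, sizeof *in);
        double *tmp = calloc(n, sizeof *tmp);
        double *out = calloc(n, sizeof *out);
        fused_sweeps(in, tmp, out);
        free(in); free(tmp); free(out);
        return 0;
    }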

    Large-Scale Parallelization of Human Migration Simulation

    Forced displacement of people worldwide, for example due to violent conflicts, is common in the modern world: today more than 82 million people are forcibly displaced. This puts migration at the forefront of the most important problems facing humanity. The Flee simulation code is an agent-based modeling tool that can forecast population displacements in civil-war settings, but performing accurate simulations requires non-negligible computational capacity. In this article, we present our approach to parallelizing Flee for fast execution on multicore platforms, and discuss the computational complexity of the algorithm and its implementation. We benchmark the parallelized code on supercomputers equipped with AMD EPYC Rome 7742 and Intel Xeon Platinum 8268 processors, and investigate its performance across a range of alternative rule sets, different refinements of the spatial representation, and various numbers of agents representing displaced persons. We find that Flee scales excellently up to 8,192 cores for large cases, although very detailed location graphs can impose a large initialization time overhead.
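
    As a toy analogue of the per-agent parallelism such a code can exploit, the following C/OpenMP sketch moves independent agents over a tiny hypothetical location graph and parallelizes the per-step agent loop across threads. The graph, move probability, and movement rule are invented for illustration only; the actual Flee implementation, and the distributed-memory parallelization benchmarked in the article, is a separate code base.

    /* Hedged sketch: per-step, per-agent parallel loop over a toy location
     * graph. Placeholder movement rule; not the Flee rule set. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N_LOCATIONS 4
    #define N_AGENTS    100000
    #define N_STEPS     10

    /* Hypothetical adjacency matrix: links[a][b] != 0 means a -> b is reachable. */
    static const int links[N_LOCATIONS][N_LOCATIONS] = {
        {0, 1, 1, 0},
        {0, 0, 1, 1},
        {0, 0, 0, 1},
        {0, 0, 0, 0},                 /* location 3 acts as a camp: no exits */
    };

    int main(void)
    {
        int *agent_loc = malloc(N_AGENTS * sizeof(int));
        for (int a = 0; a < N_AGENTS; ++a)
            agent_loc[a] = 0;         /* all agents start at location 0 */

        for (int step = 0; step < N_STEPS; ++step) {
            /* Within a step, agents decide independently, so the loop
             * parallelizes without synchronization. */
            #pragma omp parallel for schedule(static)
            for (int a = 0; a < N_AGENTS; ++a) {
                /* Tiny per-agent LCG so threads need no shared RNG state. */
                unsigned int s = (unsigned int)(step * N_AGENTS + a) * 1664525u + 1013904223u;
                int loc = agent_loc[a];
                if (s % 100 < 30) {                /* 30% move chance (placeholder rule) */
                    s = s * 1664525u + 1013904223u;
                    int target = (int)(s % N_LOCATIONS);
                    if (links[loc][target])
                        agent_loc[a] = target;
                }
            }
        }

        /* Per-location head count after the last step. */
        int counts[N_LOCATIONS] = {0};
        for (int a = 0; a < N_AGENTS; ++a)
            counts[agent_loc[a]]++;
        for (int l = 0; l < N_LOCATIONS; ++l)
            printf("location %d: %d agents\n", l, counts[l]);

        free(agent_loc);
        return 0;
    }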